Abstract. This paper investigates the number of quantum queries made to solve the problem of reconstructing an unknown string from its substrings in a certain query model. More concretely, the goal of the problem is to identify an unknown string $S$ by making queries of the following form: "Is $s$ a substring of $S$?", where $s$ is a query string over the given alphabet. The number of queries required to identify the string $S$ is the query complexity of this problem.

First we show a quantum algorithm that exactly identifies the string $S$ with at most $\frac{3}{4}N + o(N)$ queries, where $N$ is the length of $S$. This contrasts sharply with the classical query complexity $N$. Our algorithm uses Skiena and Sundaram's classical algorithm and the Grover search as subroutines. To make them work effectively, we develop another subroutine that finds a string appearing only once in $S$, which may be of independent interest. We also prove two lower bounds. The first one is a general lower bound of $\Omega(N/\log^2 N)$, which means we cannot achieve a query complexity of $O(N^{1-\epsilon})$ for any constant $\epsilon$. The other one claims that if we cannot use queries of length roughly between $\log N$ and $3\log N$, then we cannot achieve a query complexity of any sublinear function in $N$.



1 Introduction



For an input of length $N$, we usually assume that the time complexity of any algorithm $A$ is at least $N$, since $A$ needs $N$ steps just to read the input. However, there has recently been increasing demand for algorithms that run in significantly fewer than $N$ steps by sacrificing the exactness of the computation. In this case, we obviously need some mechanism for algorithms to obtain the input, since it is no longer possible to read all the input bits sequentially. Oracles are a popular model for this purpose. The most standard oracle is the so-called index oracle, a mapping $f$ from $\{0, 1, \ldots, N-1\}$ into $\{0, 1\}$ such that $f(i)$ returns the $i$th bit of the input. Thus, we need $N$ oracle calls in order to get all the input bits. Somewhat surprisingly, however, some Boolean functions can be computed with high success probability using far fewer than $N$ oracle calls. For example, a balanced AND-OR tree can be computed with $O(N^{0.753\ldots})$ oracle calls with high success probability [23].

This interesting fact becomes even more impressive if we are allowed to use quantum oracles. Due to the famous Grover search [15], we need only $O(\sqrt{N})$ oracle calls to compute the Boolean OR function with high success probability, a quadratic speed-up over its classical version (classically we need $\Omega(N)$ calls). This result is widely known as one of the two most remarkable examples demonstrating the superiority of quantum computation over classical computation (the other is Shor's integer factorization algorithm [22]).

To compute the Boolean OR, it suffices to find at least one true value among the input bits. The oracle identification problem, or the string reconstruction problem, is more general and more difficult: it requires us to recover all the bits of the input (thus any Boolean function can be computed without any additional oracle calls). The quantum index oracle is still nontrivially powerful for this problem; Ref. [9] shows that $N/2 + O(\sqrt{N})$ oracle calls are enough, while we obviously need $N$ queries in the classical counterpart. There are other types of oracles that are much more powerful for this most general problem. The quantum IP oracle [6], a function $g$ from $\{0, 1\}^N$ into $\{0, 1\}$ such that $g(q) = q \cdot x$ for the input string $x$, needs only one oracle call to recover $x$, while its classical counterpart needs $N$ oracle calls. Recently, Ref. [?] studied the balance oracle, which models the balance scale used in the counterfeit coin problem (i.e., finding the $k$ counterfeit coins among $N$ coins), and showed that its quantum version can be solved with $O(k^{1/4})$ oracle calls while the classical version requires $\Omega(k\log(N/k))$ calls, where $k$ is the number of 1's in $x$.

In 1993, Skiena and Sundaram [21] showed that $N + O(\sqrt{N})$ (classical) queries are sufficient to reconstruct the hidden string $x$ if we use a substring oracle, or an S-oracle for short. This oracle, $h(q)$, which returns 1 if the query string $q$ is a substring of $x$ and 0 otherwise, had been quite popular in the algorithms community. For example, it plays an important role in computational biology, such as sequencing by hybridization [10, ?, 19]. Note that there is no obvious way (even regardless of efficiency) of using this oracle for string reconstruction: $h(q)$ probably returns yes almost always if $|q|$ (the length of $q$) is short, say two or three, and no almost always if $|q|$ is, say, 10. Thus Skiena and Sundaram's result was highly appreciated. Its basic idea is as follows. Suppose that we already know that a string $s$ is a substring of the input $x$. Then we ask the oracle whether $s1$ is a substring. If the answer is yes, we can increase the length of the confirmed substring by one. Otherwise, we know that $s0$ is a substring or $s$ is at the right end of $x$. Just assume the former and check the latter occasionally, and we get the above bound. It is almost tight information-theoretically.
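The extension idea above can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not Skiena and Sundaram's exact procedure: the names `SubstringOracle`, `_extend` and `reconstruct` are hypothetical, the check interval is set to $\lfloor\sqrt{N}\rfloor$, and the sketch simply alternates right and left extension phases until it holds $N$ letters. Each guessed letter costs one query; a periodic check followed by a binary search detects when the guess "$s0$" was wrong because $s$ had already reached an end of $x$.

```python
from math import isqrt


class SubstringOracle:
    """Classical S-oracle for a hidden binary string x; counts the queries it answers."""

    def __init__(self, x: str):
        self._x = x
        self.queries = 0

    def __call__(self, q: str) -> bool:
        self.queries += 1
        return q in self._x


def _extend(s: str, h, n: int, step: int, right: bool) -> str:
    """One phase: extend the known substring s in one direction, one query per letter.

    If s1 (or 1s) is not a substring we *assume* the other letter and only verify
    every `step` letters. Validity is monotone along the guesses, so a binary search
    finds the last correct guess, which lies at an end of x in this direction.
    """
    grow = (lambda u, c: u + c) if right else (lambda u, c: c + u)
    while len(s) < n:
        guesses = [s]                                  # guesses[0] is known to be valid
        while len(guesses) <= step and len(guesses[-1]) < n:
            one = grow(guesses[-1], "1")
            guesses.append(one if h(one) else grow(guesses[-1], "0"))
        if h(guesses[-1]):                             # periodic check passed
            s = guesses[-1]
            continue
        lo, hi = 0, len(guesses) - 1                   # guesses[lo] valid, guesses[hi] invalid
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if h(guesses[mid]):
                lo = mid
            else:
                hi = mid
        return guesses[lo]                             # an end of x was reached
    return s


def reconstruct(h, n: int) -> str:
    """Recover x (|x| = n) from an S-oracle h by alternating right and left phases."""
    step = max(1, isqrt(n))
    s, right = "", True
    while len(s) < n:
        s = _extend(s, h, n, step, right)
        right = not right
    return s
```

For example, `reconstruct(SubstringOracle("0110"), 4)` returns `"0110"`. Each phase that does not finish the string ends at an end of $x$, so the next phase (in the other direction) gains at least one letter, which guarantees termination.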

Now here is our question in this paper: Is quantum computation also more powerful than classical computation for this oracle, and if so, by how much? One might say the answer is easy: instead of asking whether $s1$ is a substring, we ask which of $s00$, $s01$, $s10$ and $s11$ is a substring using the 1/4-Grover search [?]. Since 1/4-Grover needs just one query, we can extend the confirmed substring by two letters per call, so we would get a roughly $N/2$ upper bound. Unfortunately, it immediately turns out that this does not work, since more than one of the four candidates may be (correct) substrings of $x$ at the same time (recall that 1/4-Grover only works for a unique solution).
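To see why a unique solution matters, the following Python sketch classically simulates the amplitudes of one Grover iteration over the four candidates $s00, s01, s10, s11$: uniform superposition, one phase-flip oracle call, and inversion about the mean. The function names are ours, and the simulation evaluates the substring predicate on all four candidates classically, whereas a quantum oracle would be called once on the superposition. With exactly one marked candidate the correct answer is measured with probability 1; with two marked candidates all four outcomes become equally likely, so the search no longer identifies a solution.

```python
CANDIDATES = ["00", "01", "10", "11"]


def marked_set(x: str, s: str) -> set[int]:
    """Indices k such that s + CANDIDATES[k] is a substring of x (computed classically)."""
    return {k for k, c in enumerate(CANDIDATES) if s + c in x}


def quarter_grover_probabilities(marked: set[int]) -> list[float]:
    """Measurement probabilities after ONE Grover iteration on 4 candidates."""
    amp = [0.5] * 4                                              # uniform superposition
    amp = [-a if k in marked else a for k, a in enumerate(amp)]  # the single oracle call
    mean = sum(amp) / 4
    amp = [2 * mean - a for a in amp]                            # inversion about the mean
    return [a * a for a in amp]
```

For $x = 0110$ and $s = 01$, only $s10$ is a substring, and `quarter_grover_probabilities({2})` gives `[0.0, 0.0, 1.0, 0.0]`. For $x = 0001$ and $s = 0$, both $s00$ and $s01$ are substrings, and the result is `[0.25, 0.25, 0.25, 0.25]`.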

Our Contribution. Here is our main result in this paper:

Theorem 1 The quantum query complexity for identifying an S-oracle is at most $\frac{3}{4}N + O(\sqrt{N}\log N)$.

Therefore, the quantum algorithm is better than its classical counterpart by a factor of 3/4. Note that our algorithm is exact, as is the classical one in [21]. To cope with the difficulty mentioned above, we use Skiena and Sundaram's algorithm until the confirmed substring reaches a certain length, and then switch to an algorithm based on 1/4-Grover. There is still the possibility of multiple solutions, say $s00$ and $s01$, but now we can assume that $s$ is quite long, so the two occurrences must overlap if both are solutions. This gives us a lot of information about the string $s$, which basically changes the problem into a certain kind of string manipulation problem with a long history in theoretical computer science. Using this information, we construct a procedure that creates a situation in which 1/4-Grover is useful.

Our strong conjecture is that our problem needs at least a linear number of queries. Our basic idea is to use the quantum adversary method [3, 25], but it turns out that the wide range (one to $N$) of the lengths of query strings makes its direct application hard. We bypass this difficulty with two different approaches. The first one is to introduce a new query model, an anchored substring oracle, which lies between our substring oracle and the standard index oracle and makes it possible to exploit the basic ideas of the adversary method for the latter. This gives us the following theorem. See Appendix B for the proof.

Theorem 2 The quantum query complexity for identifying an S-oracle is $\Omega(N/\log^2 N)$.



This theorem means that there are no algorithms with a query complexity of $O(N^{1-\epsilon})$ for any positive constant $\epsilon$. The second approach is to prohibit a small range of lengths for the available queries.

Theorem 3 Suppose that we cannot use queries of length $\log N - 1 - 2\log\log N$ to $3\log N$. Then the problem of identifying an S-oracle needs $\Omega(N)$ queries.

This theorem says that queries with lengths between $\log N - 1 - 2\log\log N$ and $3\log N$ must be used "effectively" to achieve a sublinear bound. See Appendix C for the details.

Related Work. There have been many studies achieving quantum linear speedups. As mentioned already, one of the most celebrated is due to van Dam [9], who presented a quantum algorithm for identifying the oracle with $N/2 + O(\sqrt{N})$ queries. This is optimal up to a constant factor, since the lower bound $N/4$ was obtained by Ambainis [1]. Another example is ordered search, that is, finding a target in a sorted list of $N$ items. Farhi et al. [?] invented a quantum algorithm that makes at most $c\log N$ queries with $c \approx 0.53$ (note that any classical algorithm needs at least $\log N$ queries), and the constant $c$ was subsequently improved [?, 5]. These linear speedups were also shown to be tight (up to a constant factor) by the lower bound results in [2, ?, 12], which improved the previous lower bounds of [?, 13].

There are no previous quantum studies based on substring oracles, and few about string manipulation. One of them is a quantum algorithm given by Ramesh and Vinay [20], which determines whether a given pattern appears in a given text by combining Grover's search with a classical string matching technique called deterministic sampling.

2 Upper Bounds 

Now we give the definition of our oracle model. We call it a substring oracle, or simply an S-oracle. 

Definition 1 A substring oracle, or an S-oracle for short, is a binary string $x = x_0 \cdots x_{N-1} \in \{0, 1\}^N$. A query to an S-oracle is given as a string $s \in \bigcup_{i=1}^{N}\{0, 1\}^i$. The answer from the S-oracle is a binary value $\chi(x; s)$ defined as follows: if $x$ has $s$ as a substring, that is, there exists an integer $i$ such that $x_{i+k-1} = s_k$ for all $1 \le k \le |s|$, then $\chi(x; s) = 1$, and otherwise $\chi(x; s) = 0$. In quantum computation, an S-oracle is viewed as the unitary transformation $O_{S,x}$ that transforms $|s\rangle|a\rangle$ to $|s\rangle|a \oplus \chi(x; s)\rangle$.
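Definition 1 can be transcribed directly. The sketch below (function names are our own) evaluates $\chi(x; s)$ with the same indexing as the definition ($x$ indexed from 0, $s$ from 1) and applies $O_{S,x}$ to a classical basis state $|s\rangle|a\rangle$. Because the answer is XORed into the second register, applying the map twice is the identity, so $O_{S,x}$ permutes basis states and is therefore unitary.

```python
def chi(x: str, s: str) -> int:
    """chi(x; s): 1 iff some i has x_{i+k-1} = s_k for all 1 <= k <= |s|."""
    for i in range(len(x) - len(s) + 1):                  # start positions (x is 0-indexed)
        if all(x[i + k - 1] == s[k - 1] for k in range(1, len(s) + 1)):  # s is 1-indexed
            return 1
    return 0


def oracle_on_basis(x: str, s: str, a: int) -> tuple[str, int]:
    """Action of O_{S,x} on the basis state |s>|a>: it yields |s>|a XOR chi(x; s)>."""
    return s, a ^ chi(x, s)
```

For instance, with $x = 0110$, the state $|11\rangle|0\rangle$ is mapped to $|11\rangle|1\rangle$, while $|00\rangle|0\rangle$ is left unchanged.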

To prove Theorem 1, we define some notation on strings. The concatenation of strings $u$ and $v$ will be denoted $uv$. When $z = uv$, we call $u$ a prefix of $z$ and $v$ a suffix of $z$. A string $v$ is called a presuffix of a string $w$ if $v$ is a prefix of $w$ and also a suffix of $w$. The string formed by concatenating $i$ copies of $z$ will be denoted $z^i$. A string $t$ is called the periodic string of a string $a$ if $t$ is the shortest string such that $a_j = t_{(j \bmod |t|)}$ for all $j$ (or, equivalently, $t$ is the shortest string such that $a$ can be written as $a = t^k b$ for some integer $k$ and some prefix $b$ of $t$).
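Presuffixes and periodic strings correspond to the classical notions of borders and periods. A standard way to compute them, which we assume here rather than take from the paper, uses the Knuth-Morris-Pratt prefix function: the presuffixes of $w$ are found by following the chain of longest borders, and the length of the periodic string of $a$ is $|a|$ minus the length of its longest proper presuffix. The sketch also assumes that presuffixes are nonempty and proper (shorter than $w$); the function names are hypothetical.

```python
def prefix_function(w: str) -> list[int]:
    """pi[i] = length of the longest proper border of w[:i + 1] (Knuth-Morris-Pratt)."""
    pi = [0] * len(w)
    for i in range(1, len(w)):
        k = pi[i - 1]
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return pi


def presuffixes(w: str) -> list[str]:
    """Nonempty proper presuffixes of w, shortest first (assumed reading of the definition)."""
    pi = prefix_function(w)
    out, k = [], (pi[-1] if w else 0)
    while k > 0:
        out.append(w[:k])      # w[:k] is both a prefix and a suffix of w
        k = pi[k - 1]          # next shorter border
    return out[::-1]


def periodic_string(a: str) -> str:
    """Shortest t with a = t^k b, b a prefix of t: |t| = |a| - (longest proper presuffix)."""
    return a[: len(a) - (prefix_function(a)[-1] if a else 0)]
```

For $a = 1011011$ this yields the presuffixes $1$ and $1011$ and the periodic string $101$, so $a = (101)^2\,1$.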

2.1 Basic Ideas and Algorithms 

Before the full description of our algorithm, we present the basic idea. The algorithm has three main steps. In the first step, we use Skiena and Sundaram's algorithm [21], which extends a substring of the oracle string $x$ by one letter with one query. In the second step, we extend the substring $z$ obtained in the first step to a string $z_{out}$ so that $z_{out}$ appears only once in $x$. Note that the first and second steps are implemented classically. The third step is quantum: we apply Grover's search algorithm [15] in the special case called the 1/4-Grover search [?]. Recall that the 1/4-Grover search finds a solution with certainty using only one query when we know that exactly one of four candidates is a solution. Since the second step ensures that $z_{out}$ appears only once in $x$, exactly one of $\{00z, 01z, 10z, 11z\}$ is a substring of $x$ for any substring $z$ of $x$ that extends $z_{out}$, unless $z$ corresponds to the leftmost part of $x$. So the 1/4-Grover search can extend the substring by two letters with only one query. If we know that $z$ is a prefix of the oracle string, we run the 1/4-Grover search for $\{z00, z01, z10, z11\}$.

The second step is the most technical, and it is also essential for implementing the third step successfully. A key idea for obtaining a substring that appears only once is relatively simple: extending $z$ by its periodic string. For instance, assume $x = 1010110110110111110$. Then the substring $z = 1011011$ appears three times in $x$. The periodic string $t$ of $z$ is $t = 101$. Let us extend $z$ by $t$ as many times as possible such that $t^i z$ is still a substring of $x$. In the example we get the substring $t^2 z = 1011011011011$, which appears only once in $x$. Now the difficulty is to make the string $z$ obtained in the first step as short as possible, which improves the complexity of the algorithm. Another key idea for this difficulty is to analyze what happens when $z$ appears twice in $x$. When a substring $z$ with length $> N/2$ appears twice in $x$, these occurrences of $z$ must partially overlap, and $x$ has a substring $uvw$ such that $z = uv = vw$. A key property is that the overlapping string $v$ is a presuffix of $z$. Using these key ideas, we can construct the algorithm by starting from a substring $z$ of length $> N/2$.
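The example above can be checked mechanically. The sketch below (helper names are ours) lists overlapping occurrences, prepends the periodic string $t = 101$ while the result remains a substring of $x$, and computes the overlap $v$ of two occurrences of $z$ that start `shift` positions apart; such a $v$ is both a prefix and a suffix of $z$, i.e., a presuffix.

```python
def occurrences(x: str, z: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of z in x."""
    return [i for i in range(len(x) - len(z) + 1) if x.startswith(z, i)]


def extend_by_period(x: str, z: str, t: str) -> str:
    """Prepend copies of t to z while the result is still a substring of x."""
    while t + z in x:
        z = t + z
    return z


def overlap(z: str, shift: int) -> str:
    """Overlap v of two occurrences of z starting `shift` positions apart (0 < shift < |z|)."""
    return z[shift:]   # the suffix of the first occurrence = the prefix of the second
```

With $x = 1010110110110111110$ and $z = 1011011$, `occurrences` returns `[2, 5, 8]`, `extend_by_period(x, z, "101")` returns `"1011011011011"`, which occurs only at position 2, and the overlap of the occurrences at positions 2 and 5 is $1011$, a presuffix of $z$.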

Now we give an exact algorithm Identify and its subroutine MakeOnce. 
Algorithm Identify 
Input: an S-oracle $O_{S,x}$.
Output: the oracle string $x$.

Step 1. Find a substring $z$ of length $\lfloor N/2 \rfloor + 1$ using Skiena and Sundaram's algorithm [21].
Step 2. Run the algorithm MakeOnce on input $z$. Let $z_{out}$ be the output.

Step 3. Repeatedly extend $z_{out}$ to the left by 2 letters using the 1/4-Grover search. After every $\sqrt{N}$ applications of the 1/4-Grover search, check whether the extended string is a substring of $x$. If not, we know that a prefix of $x$ was obtained between the current checkpoint and the previous one; find this prefix by binary search.

Step 4. Repeatedly extend the current substring to the right by 2 letters using the 1/4-Grover search, and stop when the length of the substring becomes $N - 1$ or $N$. If the length is $N - 1$, use a classical query to find the last bit.
(End of Algorithm Identify) 
Algorithm MakeOnce 

Input: a string $z$ (a substring of $x$ with $|z| > N/2$); an S-oracle $O_{S,x}$.
Output: a substring $z_{out}$ that appears only once in $x$.
Step 1. $T_0 := \emptyset$. $A_0 := \emptyset$. $l := 1$. $z_1 := z$. ($A_l$ is used for the analysis.)
Step 2. Repeat Steps 2.1-2.7.

Step 2.1. Find the shortest string $a_l$ satisfying the following conditions.

(i) $a_l$ is a presuffix of $z_l$.

(ii) The periodic string of $a_l$ is not in $T_{l-1}$.

If there is no such string, go to Step 3. Let $t_l$ be the periodic string of $a_l$. $T_l := T_{l-1} \cup \{t_l\}$. $A_l := A_{l-1} \cup \{a_l\}$.

Step 2.2. Find the largest integer $i$ such that $t_l^i z_l$ is also a substring of $x$. Define $z'_l := t_l^i z_l$.
Step 2.3. Let $j$ be the largest integer such that $z'_l = u t_l^j a_l$ for some string $u$.
Step 2.4. Let $h$ be the largest integer such that $z'_l = t_l^h a_l w$ for some string $w$.
Step 2.5. If $z'_l = t_l^j a_l$ or $h < j$, then $z_{l+1} := z'_l$ and go to Step 2.7.

Step 2.6. Find the largest integer $k$ such that $u^k z'_l$ is also a substring of $x$. Define $z_{l+1} := u^k z'_l$.
Step 2.7. $l := l + 1$.

Step 3. $I_{max} := l$. $z_{out} := z_{I_{max}}$. $T := T_{I_{max}-1}$. $A := A_{I_{max}-1}$. ($I_{max}$, $T$ and $A$ are used for the analysis.)

(End of Algorithm MakeOnce) 
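The purely combinatorial parts of MakeOnce (Steps 2.1, 2.3 and 2.4) need no oracle queries. A possible Python rendering is sketched below; the function names are hypothetical, presuffixes are again assumed to be nonempty and proper, and `step_2_4` assumes (as property 1 below guarantees for $a_l$) that $a = tb$ with $b$ a prefix of $t$ and that $z'$ starts with $a$, so that the valid values of $h$ form an initial segment.

```python
from typing import Optional


def prefix_function(w: str) -> list[int]:
    """Knuth-Morris-Pratt prefix function (longest proper border of each prefix)."""
    pi = [0] * len(w)
    for i in range(1, len(w)):
        k = pi[i - 1]
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return pi


def presuffixes(w: str) -> list[str]:
    """Nonempty proper presuffixes of w, shortest first."""
    pi, out = prefix_function(w), []
    k = pi[-1] if w else 0
    while k > 0:
        out.append(w[:k])
        k = pi[k - 1]
    return out[::-1]


def periodic_string(a: str) -> str:
    return a[: len(a) - (prefix_function(a)[-1] if a else 0)]


def step_2_1(z: str, T: set[str]) -> Optional[tuple[str, str]]:
    """Shortest presuffix a of z whose periodic string t is not in T, as (a, t); else None."""
    for a in presuffixes(z):          # shortest first, so the first hit is the answer
        t = periodic_string(a)
        if t not in T:
            return a, t
    return None                       # MakeOnce would now go to Step 3


def step_2_3(zp: str, t: str, a: str) -> int:
    """Largest j with zp = u t^j a for some (possibly empty) u; assumes zp ends with a."""
    j = 0
    while zp.endswith(t * (j + 1) + a):   # t^(j+1) a ends with t^j a, so this is monotone
        j += 1
    return j


def step_2_4(zp: str, t: str, a: str) -> int:
    """Largest h with zp = t^h a w for some w; assumes zp starts with a and a = tb."""
    h = 0
    while zp.startswith(t * (h + 1) + a):
        h += 1
    return h
```

On $z = 1011011$ with $T = \emptyset$, Step 2.1 picks $a = 1$ (periodic string $1$); with $T = \{1\}$ it picks $a = 1011$ with $t = 101$. For $z' = 1011011011011 = t^3 a$, both Step 2.3 and Step 2.4 give $3$.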
2.2 Analysis of MakeOnce 

In this section, we give the analysis of MakeOnce. First, a number of properties are given for the analysis. See Appendix A for the proof.

Lemma 1 For any $l < I_{max}$, MakeOnce satisfies the following properties.

1. $a_l \in A$ is represented as $a_l = t_l b_l$, where $t_l \in T$ and $|b_l| < |t_l|$.
2. $z'_l$ and $a_l \in A$ are prefixes of $z_{l+1}$.
3. $z_l$ (and hence $a_l \in A$) is a suffix of $z_{l+1}$.
4. $a_l$ is a presuffix of $a_{l+1}$ and $|a_{l+1}| > |a_l|$.
5. $|a_{l+1}| \ge |a_l| + |t_l|$.
6. $|t_{l+1}| > |t_l|$.
7. At Step 2.6, $|u| \ge |t_l|$.
8. $I_{max} = O(\sqrt{N})$.





Now we analyze the query complexity and the correctness of MakeOnce. In what follows, we refer to properties 1-8 of Lemma 1 simply as properties 1-8.

Proposition 1 MakeOnce uses at most $O(\sqrt{N}\log N)$ queries.

Proof. To obtain $z_{out}$, we need queries only at Steps 2.2 and 2.6. These steps can be implemented by binary search to find $t_l^i z_l$ and $u^k z'_l$, which uses $O(\log N)$ queries. Since the number of repetitions of Step 2 is $O(\sqrt{N})$ by property 8, the total number of queries is $O(\sqrt{N}\log N)$.

□
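Steps 2.2 and 2.6 both search for the largest exponent $e$ such that $p^e y$ is a substring of $x$, where $(p, y)$ is $(t_l, z_l)$ or $(u, z'_l)$. The exponents with this property form an initial segment, because $p^{e-1}y$ is a suffix of $p^e y$, so binary search applies. The sketch below (names are ours; the oracle counts its queries) assumes $y$ is already known to be a substring and uses $|p^e y| \le N$ to bound the search range, giving $O(\log N)$ queries per call.

```python
class CountingOracle:
    """Classical S-oracle for a hidden string x that counts its queries."""

    def __init__(self, x: str):
        self._x = x
        self.queries = 0

    def __call__(self, q: str) -> bool:
        self.queries += 1
        return q in self._x


def largest_power(h, n: int, p: str, y: str) -> int:
    """Largest e >= 0 such that p^e y is a substring of x (|x| = n), assuming y is one."""
    lo = 0                              # invariant: p^lo y is a substring of x
    hi = (n - len(y)) // len(p) + 1     # p^hi y is longer than x, so it is not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if h(p * mid + y):
            lo = mid
        else:
            hi = mid
    return lo
```

On the running example ($x = 1010110110110111110$, $N = 19$, $t = 101$, $z = 1011011$) the search returns $2$ after two queries, matching $t^2 z = 1011011011011$.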

Proposition 2 The output $z_{out}$ of MakeOnce appears exactly once in $x$.

Proposition 2 is proved by contradiction. Assume that $z_{out}$ appears twice in $x$. Since $|z_{out}| > N/2$, $x$ has a substring $uvw$ such that $z_{out} = uv = vw$, where $|u| = |w| > 0$. Then we can see that $v$ has the following special form.

Lemma 2 $v = t_l^m a_l$ for some $l \ge 1$ and $m \ge 0$, where $t_l \in T$ and $a_l \in A$.

Proof. First, note that no string satisfies the conditions of Step 2.1 for $z_{out}$, since we go to Step 3 and output $z_{out}$ only when there is no string satisfying the conditions at Step 2.1. On the other hand, $v$ is a presuffix of $z_{out}$, which means that $v$ satisfies condition (i) of Step 2.1. This implies that $v$ does not satisfy condition (ii) of Step 2.1; that is, the periodic string of $v$ must be $t_l \in T$ for some $l$. Hence $v$ is represented as $v = t_l^{m'} y$, where $l > 0$, $m' > 0$, $t_l \in T$, and $|y| < |t_l|$.

For $a_l \in A$, let $b_l$ be the string such that $a_l = t_l b_l$ and $|b_l| < |t_l|$, as guaranteed by property 1. By property 3, $a_l$ is a suffix of $z_{out}$. Also, $v$ is a suffix of $z_{out}$. Thus $y$ has suffix $b_l$ or $b_l$ has suffix $y$. Now we show that $y = b_l$ by contradiction. Assuming $|y| < |b_l|$, it must hold that $t_l b_l = y' t_l y$ for some $y'$ such that $|y'| < |t_l|$. Then the length of the periodic string of $a_l$ is at most $|y'|$, which contradicts that $t_l$ is the periodic string of $a_l$. Assuming $|y| > |b_l|$, we have $y' t_l b_l = t_l y$ for some $y'$ such that $|y'| < |t_l|$. Then the length of the periodic string of $a_l$ is at most $|y'|$, which also leads to a contradiction.

By the above arguments, $v$ is represented as $v = t_l^{m'} b_l = t_l^m a_l$, where $m = m' - 1$. □

The main statement for the correctness of MakeOnce is now stated as follows. (In the rest of 
this section, we assume that u, w, u' and w' have positive length.) 

Lemma 3 For any $l \le I_{max}$, any $c < l$ and any $m \ge 0$, $x$ has no substring $u t_c^m a_c w$ such that $z_l = u t_c^m a_c = t_c^m a_c w$, $t_c \in T_{l-1}$ and $a_c \in A_{l-1}$.

Then, by the assumption that $z_{out}$ $(= z_{I_{max}})$ appears twice in $x$, Lemma 2 implies that $x$ has a substring $u t_l^m a_l w$ with $z_{out} = u t_l^m a_l = t_l^m a_l w$ for some $1 \le l \le I_{max} - 1$ and $m \ge 0$, which contradicts Lemma 3. This completes the proof of Proposition 2.

What remains is the proof of Lemma 3. We prove the statement by induction on $l$. The case $l = 1$ is easy. In this case, $T_0 = A_0 = \emptyset$ and $z_1 = u = w$, where $|z_1| = |z| > N/2$. Hence $x$ cannot have the substring $uw = z_1 z_1$, which is longer than $N$. Next we assume that the statement holds for $l$, and show that it holds for $l + 1$. For this purpose, we first show the following lemma:

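Lemma 4 If $x$ has $u' t_{c'}^{m'} a_{c'} w'$ as a substring such that $z_{l+1} = u' t_{c'}^{m'} a_{c'} = t_{c'}^{m'} a_{c'} w'$ for some $c' < l$ and $m' \ge 0$, then $x$ also has a substring $u t_c^m a_c w$ such that $z_l = u t_c^m a_c = t_c^m a_c w$ for some $c \le c'$ and $m \ge 0$.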

Proof. By property 3, $z_l$ is a suffix of $z_{l+1}$. By the assumption, $z_{l+1}$ appears twice in $x$, and hence $z_l$ also appears twice in $x$. Since $|z_l| > N/2$, $x$ has a substring $uvw$ with $z_l = uv = vw$. Let $t$ be the periodic string of $v$. Then $v$ is represented as $v = t^k b$ for some integer $k > 0$, where $|b| < |t|$ and $b$ is a prefix of $t$. Note that $|t| \le |t_{c'}|$, since $v$ is a suffix of $t_{c'}^{m'} a_{c'}$.

Now we show that there is $c \le c'$ such that $t = t_c$ and $b = b_c$, where $b_c$ is the string such that $t_c b_c = a_c$, as guaranteed by property 1. First we show $t \in T_l$, which means that $t = t_c$ for some $c \le c'$ by $|t| \le |t_{c'}|$ and property 6. For contradiction, assume that $t \notin T_l$ (and hence $t \notin T_{l-1}$). Note that since $v$ satisfies condition (i) of Step 2.1 for the $l$-th loop (i.e., $v$ is a presuffix of $z_l$), $tb$ also satisfies this condition. Then, by $t \notin T_{l-1}$, $tb$ satisfies conditions (i) and (ii) at Step 2.1. Since $a_l$ is the shortest string satisfying conditions (i) and (ii) at Step 2.1, $|tb| \ge |a_l|$. This means that $a_l$ is a prefix of $tb$. Then we have $|t_l| \le |t| \le |t_{c'}|$ and $c' < l$. This contradicts property 6. Second, we show $b = b_c$. To this end, it suffices to show $|b| = |b_c|$, because both $b$ and $b_c$ are prefixes of $t = t_c$. Assume that $|b| < |b_c|$. Then $y t_c b = t_c b_c$ for some $y$ such that $|y| < |t_c|$, since $tb = t_c b$ is a suffix of $z_l$ and also, by property 3, $a_c = t_c b_c$ is a suffix of $z_l$. Then the length of the periodic string of $a_c$ is at most $|y|$, which contradicts that $t_c$ is the periodic string of $a_c$. By a similar argument, we also have a contradiction assuming that $|b| > |b_c|$. Thus $|b| = |b_c|$.

We conclude that $tb = t_c b_c = a_c$, which completes the proof of Lemma 4. □

Lemma 4 and the induction hypothesis imply that for any $c < l$ and $m \ge 0$, $x$ has no substring $u t_c^m a_c w$ such that $z_{l+1} = u t_c^m a_c = t_c^m a_c w$, $t_c \in T_l$ and $a_c \in A_l$. We now show another lemma.

Lemma 5 For any $m \ge 0$, $x$ has no substring $u' t_l^m a_l w'$ such that $z_{l+1} = u' t_l^m a_l = t_l^m a_l w'$.

Proof. For contradiction, assume that there is an $m \ge 0$ such that $x$ has a substring $u' t_l^m a_l w'$ satisfying $z_{l+1} = u' t_l^m a_l = t_l^m a_l w'$. We derive a contradiction in each of the three possible cases at Step 2.5: (1) $z'_l = t_l^j a_l$; (2) $h < j$; (3) the other case.

In case (1), since $t_l^j a_l = z_{l+1} = u' t_l^m a_l$, we have $m < j$ and $u' = t_l^{j-m}$. Then $u' t_l^m a_l w' = t_l^{j-m} z'_l$ is a substring of $x$, which contradicts the maximality of $i$ for $z'_l = t_l^i z_l$ at Step 2.2.

In case (2), $h < j$ and $z_{l+1} := z'_l = u t_l^j a_l$ for some $u$. Note that $m \le h$, since $h$ is taken as the largest integer such that $z'_l = t_l^h a_l w$ for some $w$ at Step 2.4. Thus $j > m$, and hence $u' t_l^m a_l w' = u t_l^{j-m} z'_l$. This implies that $u t_l^{j-m} z'_l$, and hence $t_l^{j-m} z'_l$, is included in $x$, which contradicts the maximality of $i$ for $z'_l = t_l^i z_l$ at Step 2.2.
In case (3), where $h \ge j$, we take the largest integer $k$ such that $u^k z'_l = u^{k+1} t_l^j a_l$ is a substring of $x$, and let $z_{l+1} := u^{k+1} t_l^j a_l$ at Step 2.6. Notice that $u$ does not have suffix $t_l$, and $|u| \ge |t_l|$ by property 7. This implies that if $z_{l+1} = u^{k+1} t_l^j a_l$ has a suffix $t_l^{j'} a_l$, then $j' \le j$. By the assumption, $z_{l+1}$ has $t_l^m a_l$ as a suffix, which means $m \le j$. Moreover, we can show that $m = j$: since $z'_l$ is a prefix of $z_{l+1}$ by property 2, there is a string $w''$ such that $z_{l+1} = z'_l w''$. Then $x$ includes $u' t_l^m a_l w' = u^{k+1} t_l^{j-m} z'_l w''$. However, by Step 2.2, $x$ does not have $t_l z'_l$ as a substring, which implies that $m = j$. Then $x$ has a substring $(u^{k+1} t_l^j a_l) w' = u^{k+1}(t_l^m a_l w') = u^{k+1} z_{l+1} = u^{2k+1} z'_l$, which contradicts the maximality of $k$ at Step 2.6. □

By the above two lemmas, for any $c < l + 1$ and $m \ge 0$, $x$ has no substring $u t_c^m a_c w$ such that $z_{l+1} = u t_c^m a_c = t_c^m a_c w$, $t_c \in T_l$ and $a_c \in A_l$. That is, the statement of Lemma 3 holds for $l + 1$ under the assumption that it holds for $l$. This completes the proof of Lemma 3.

2.3 Analysis of Identify 

First, following the basic idea described in Section 2.1, the correctness of Identify is easily verified. The output $z_{out}$ of MakeOnce appears in $x$ only once by Proposition 2. This guarantees that the 1/4-Grover search can successfully extend the current string by two letters in Steps 3 and 4 unless it reaches the left or right end of $x$. Moreover, the algorithm knows when the string reaches an end, through the regular checks in Step 3 or the current length in Step 4.

Second, we analyze the number of queries used in Identify. At Step 1, we find a substring of length $\lfloor N/2 \rfloor + 1$ by extending a string by one letter with one query. Then the number of queries at Step 1 is $\lfloor N/2 \rfloor + 1$. At Step 2, the subroutine MakeOnce uses $O(\sqrt{N}\log N)$ queries by Proposition 1. At Steps 3 and 4, we extend a substring of length greater than $N/2$ by two letters with one query. Note that the number of checks of whether the current string is a substring of $x$ is $O(\sqrt{N})$. Thus the number of queries at Steps 3 and 4 is at most $N/4 + O(\sqrt{N})$. Therefore, the total number of queries is at most $\frac{3}{4}N + O(\sqrt{N}\log N)$.

This completes the proof of Theorem 1.

3 Conclusion 

Obvious future work includes a (possible) improvement of the constant factor for the upper bound and an attempt at a linear lower bound (we strongly believe there are no sublinear algorithms). For the former, one possibility is to exploit a parity computation, as was done in [?]. However, we do not have any indication that parity is substantially easier than reconstruction itself for substring oracles. For the latter, we at least need to get rid of the reduction of Section B.2, since we have already lost a $\log N$ factor by that. Different approaches such as the polynomial method [4] do exist as a possibility, but we have no idea in this direction, either, at this moment.
A Proof of Lemma 1

1. Let us consider Step 2.1 in the $l$-th loop. Since $a_l$ is chosen at Step 2.1, $a_l$ satisfies the conditions of Step 2.1. That is, $a_l$ is represented as $a_l = t_l^m b_l$ for some $m$, where $t_l$ is the periodic string of $a_l$, $t_l \notin T_{l-1}$, and $|b_l| < |t_l|$.

Now we show $m = 1$ by contradiction. Suppose that $m \ge 2$. By the definition of the periodic string, $b_l$ is a prefix of $t_l$; that is, $t_l = b_l y$ for some $y$. Thus $t_l b_l$ is a prefix of $a_l$. Clearly, $t_l b_l$ is also a suffix of $a_l$. Then $t_l b_l$ is a presuffix of $z_l$, because $a_l$ satisfies condition (i) of Step 2.1. Since $a_l$ satisfies condition (ii) of Step 2.1, $t_l$ is not in $T_{l-1}$. Then $t_l b_l$, which is shorter than $a_l$ because $m \ge 2$, also satisfies the conditions of Step 2.1. This contradicts the fact that $a_l$ is the shortest such string. Therefore $m = 1$, that is, $a_l = t_l b_l$. □

2. By property 1, $a_l = t_l b_l$. At Step 2.2, we extend $z_l$ to the left by $t_l^i$. Since $b_l$ is a prefix of $t_l$, $t_l^i a_l$ has $a_l$ as a prefix. Thus $z'_l := t_l^i z_l$ also has $a_l$ as a prefix.

Next, we show that $z_{l+1}$ has $z'_l$ as a prefix. There are two cases in determining $z_{l+1}$: (1) $z_{l+1} := z'_l$ (at Step 2.5); and (2) $z_{l+1} := u^k z'_l$ (at Step 2.6). In case (1), it is obvious that $z_{l+1}$ has $z'_l$ as a prefix. In case (2), since we go to Step 2.6, $h \ge j$. Since $z'_l = t_l^h a_l w$, $z'_l$ has $t_l^j a_l$ as a prefix. Then $u z'_l$ has $u t_l^j a_l = z'_l$ as a prefix, that is, $u z'_l = z'_l w'$ for some $w'$. This implies that $u^k z'_l = z'_l (w')^k$, and hence $u^k z'_l$ has $z'_l$ as a prefix. Therefore, $z_{l+1}$ has $z'_l$ as a prefix. Since $z'_l$ has $a_l$ as a prefix, $z_{l+1}$ also has $a_l$ as a prefix. □

3. Because we extend $z_l$ only to the left, $z_{l+1}$ clearly has $z_l$ as a suffix. Then $a_l$ is also a suffix of $z_{l+1}$. □

4. First we show that $a_l$ is a presuffix of $a_{l+1}$. Note that either $a_l$ is a presuffix of $a_{l+1}$ or $a_{l+1}$ is a presuffix of $a_l$, since, by the conditions of Step 2.1 and properties 2 and 3, both $a_l$ and $a_{l+1}$ are presuffixes of $z_{l+1}$. Thus it suffices to assume that $|a_{l+1}| < |a_l|$ and derive a contradiction. Since $z_{l+1}$ has $a_l$ as a presuffix by properties 2 and 3, $a_{l+1}$ is a presuffix of $a_l$. This implies that $a_{l+1}$ is a presuffix of $z_l$. Then $a_{l+1}$ satisfies the conditions of Step 2.1 during the $l$-th loop. This contradicts the fact that $a_l$ is the shortest such string.

Second, we show that $|a_{l+1}| \ne |a_l|$ by contradiction, which implies $|a_{l+1}| > |a_l|$. Assuming $|a_{l+1}| = |a_l|$, we have $a_{l+1} = a_l$ by property 2. Then $t_{l+1} = t_l$. However, $t_{l+1}$ is not in $T_l$ by condition (ii) at Step 2.1, which also leads to a contradiction. □

5. By property 4, $a_l$ is a presuffix of $a_{l+1}$ and $|a_{l+1}| > |a_l|$. Thus there are some $y$ and $y'$ with $|y| = |y'| > 0$ such that $a_{l+1} = a_l y = y' a_l$. Now it suffices to show $|y'| \ge |t_l|$. Assuming $|y'| < |t_l|$, the length of the periodic string of $a_l$ is at most $|y'|$. This contradicts the fact that $t_l$ is the periodic string of $a_l$. □

6. By contradiction. First, assume $|t_{l+1}| < |t_l|$. Noting that $a_l$ is a prefix of $a_{l+1}$ by property 4, $a_l$ is represented as $a_l = t_{l+1}^m b$ for some integer $m$ and string $b$. Then the length of the periodic string of $a_l$ is at most $|t_{l+1}|$. This contradicts the fact that $t_l$ is the periodic string of $a_l$. Second, assume that $|t_{l+1}| = |t_l|$. This means that $t_l = t_{l+1}$, while $t_{l+1}$ is not in $T_l$ by condition (ii) at Step 2.1, which is also a contradiction. □

7. We need to consider the following two cases at Step 2.6: (1) $h > j$ and (2) $h = j$. In case (1), $|z'_l| = h|t_l| + |a_l| + |w| = |u| + j|t_l| + |a_l|$. Thus $|u| - |t_l| = (h - j - 1)|t_l| + |w| \ge 0$. In case (2), $z'_l = u t_l^j a_l = t_l^j a_l w$. Now we show that $|u| > |t_l|$ by contradiction. Assuming that $|u| = |t_l|$, we have $u = t_l$ since $z'_l$ has $t_l$ as a prefix. This implies that $z'_l = t_l^{j+1} a_l$, which contradicts the definition of $j$ at Step 2.3. Assuming that $|u| < |t_l|$, the length of the periodic string of $z'_l$ is at most $|u|$. This implies that the length of the periodic string of $a_l$ is also at most $|u|$, which contradicts that $t_l$ is the periodic string of $a_l$. □

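8. By properties 5 and 6 (the latter gives $|t_l| \ge l$), $N \ge |a_{I_{max}-1}| \ge \sum_{l=1}^{I_{max}-2} |t_l| \ge \sum_{l=1}^{I_{max}-2} l \ge \frac{(I_{max}-2)^2}{2}$. This implies $I_{max} = O(\sqrt{N})$. □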



B Proof of Theorem 2

Our proof consists of two steps. First, we introduce another oracle model (the AS-oracle) similar to the S-oracle and show a lower bound on the query complexity of identifying an AS-oracle. Second, we reduce the identification problem for an AS-oracle to the identification problem for an S-oracle, with some overhead.

To show the lower bound for AS-oracles, we now revisit one version of the (nonnegative) quantum adversary method, called the strong weighted adversary method in [23], due to Zhang [25]. Let $f$ be a function from a finite set $S$ to another finite set $S'$. The goal is to compute $f(x)$, where $x \in S$ is the input. In the query complexity model, the input $x$ is given as an oracle. More precisely, suppose that the oracle $O_x$ corresponding to $x$ is the unitary transformation $O_x|q, a, z\rangle = |q, a \oplus \zeta(x; q), z\rangle$, where $|q\rangle$ is the register for a query string $q$ from a finite set $Q$, $|a\rangle$ is the register for the binary answer $\zeta(x; q)$, and $|z\rangle$ is the work register. Here $\zeta$ is some function from $S \times Q$ to $\{0, 1\}$. Then the strong adversary method is restated as follows:

Lemma 6 Let $w, w'$ denote a weight scheme as follows:

1. Every pair $(x, y) \in S \times S$ is assigned a nonnegative weight $w(x, y) = w(y, x)$ that satisfies $w(x, y) = 0$ whenever $f(x) = f(y)$.

2. Every triple $(x, y, q) \in S \times S \times Q$ is assigned a nonnegative weight $w'(x, y, q)$ that satisfies $w'(x, y, q) = 0$ whenever $\zeta(x; q) = \zeta(y; q)$ or $f(x) = f(y)$, and $w'(x, y, q)\,w'(y, x, q) \ge w^2(x, y)$ for all $x, y, q$ such that $\zeta(x; q) \ne \zeta(y; q)$ and $f(x) \ne f(y)$.

For all $x, q$, let $\mu(x) = \sum_y w(x, y)$ and $\nu(x, q) = \sum_y w'(x, y, q)$. Then, the quantum query complexity of $f$ is at least

$$\Omega\left(\max_{w, w'} \min_{\substack{x, y, q:\ w(x, y) > 0,\\ \zeta(x; q) \ne \zeta(y; q)}} \sqrt{\frac{\mu(x)\mu(y)}{\nu(x, q)\nu(y, q)}}\right).$$



B.1 New Oracle Model - Anchored Substring Oracle

To prove Theorem 2, we introduce another oracle model similar to the S-oracle. We call it an anchored substring oracle or, simply, an AS-oracle.

Definition 2 An AS-oracle is a binary string $X = (x_0, x_1, \ldots, x_{N-1})$. A query to the AS-oracle is a pair of an index and a string $q = (i, s) \in \{0, 1, \ldots, N-1\} \times \{0, 1\}^*$. The answer from the AS-oracle is the binary value $\tau(X; q)$ defined as follows: if the substring $x_i x_{i+1} \cdots x_{i+|s|-1}$ of $X$ is equal to $s$, then $\tau(X; q) = 1$; otherwise $\tau(X; q) = 0$.
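Definition 2 differs from Definition 1 only in that the position of the match is fixed by the query. A minimal sketch follows (the function name is ours); windows that run past the end of $X$ are treated as mismatches, which is our reading of the definition. An S-oracle answer for $s$ is the OR of the AS-oracle answers for $(i, s)$ over all anchors $i$.

```python
def tau(x: str, q: tuple[int, str]) -> int:
    """Answer of the AS-oracle to the query q = (i, s)."""
    i, s = q
    # x[i:i + |s|] is shorter than s when the window runs past the end, so it cannot match
    return int(x[i:i + len(s)] == s)
```

For $X = 0110$, the query $(1, 11)$ is answered 1, while $(0, 11)$ and $(3, 00)$ are answered 0.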
We give the following lower bound.

Lemma 7 The quantum query complexity for identifying an AS-oracle is $\Omega(N/\log N)$.

Proof. Let us assume that $N$ is divisible by $\lceil \log N \rceil$ (the generalization is easy and omitted). We regard the oracle string $X \in \{0, 1\}^N$ as $X = (X_1, X_2, \ldots, X_{N/\lceil \log N \rceil}) \in [M]^{N/\lceil \log N \rceil}$, where $X_j \in \{0, 1\}^{\lceil \log N \rceil}$ for $j = 1, \ldots, N/\lceil \log N \rceil$ and $M = 2^{\lceil \log N \rceil}$. For any instances $X, Y$, we define $D(X, Y) := |\{i \mid X_i \ne Y_i\}|$ and call this quantity the block distance between $X$ and $Y$.

We now define sets $Q(a,l,b)$ of queries to the AS-oracle for any $a, b \in \{0, \ldots, \lceil \log N \rceil - 1\}$ and any $l \in \{0, \ldots, N/\lceil \log N \rceil\}$ such that $(a,l) \neq (0,0)$. If $a \neq 0$, we define

$$Q(a,0,0) = \{ q = (i,s) \mid |s| = a \text{ and } j \cdot \lceil \log N \rceil \le i < a + i \le (j+1) \cdot \lceil \log N \rceil \text{ for some } j \}.$$

A query $q \in Q(a,0,0)$ reads $a$ letters in the same block. If $l > 0$ or $ab > 0$, then we define

$$Q(a,l,b) = \{ q = (i,s) \mid |s| = a + l \cdot \lceil \log N \rceil + b \text{ and } a + i \equiv 0 \pmod{\lceil \log N \rceil} \}.$$

A query $q \in Q(a,l,b)$ reads, for some index $j$, the last $a$ letters in block $X_{j-1}$, all the letters in the $l$ blocks $X_j$ to $X_{j+l-1}$, and the first $b$ letters in block $X_{j+l}$.
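Every in-range query falls into exactly one of these classes, and the class can be read off from the start position and the length. A Python sketch of that classification (our own helper, with 0-indexed positions and block length `k` standing for $\lceil \log N \rceil$):

```python
def query_class(i: int, length: int, k: int) -> tuple[int, int, int]:
    """Return (a, l, b) such that a query (i, s) with |s| = length lies in Q(a, l, b).

    k is the block length (ceil(log N) in the text); positions are 0-indexed.
    """
    assert length >= 1
    if i // k == (i + length - 1) // k and length < k:
        return (length, 0, 0)          # Q(a, 0, 0): a letters inside one block
    a = (-i) % k                       # tail of the first block, so a + i = 0 (mod k)
    l, b = divmod(length - a, k)       # l whole blocks, then the first b letters of the next
    return (a, l, b)


for i, length in [(1, 2), (0, 4), (3, 7), (2, 4)]:
    print((i, length), query_class(i, length, k=4))
```

For example, with $k = 4$ the query starting at position 3 with length 7 reads the last letter of the first block, the whole second block, and the first two letters of the third block, so it lies in $Q(1,1,2)$. A query that reads exactly one whole block is classified as $Q(0,1,0)$, since $a \le \lceil \log N \rceil - 1$ rules out $Q(k,0,0)$.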

To use Lemma 1, let $S = \{0,1\}^N$, $Q = \bigcup_{a,l,b} Q(a,l,b)$, $C(X;q) = \tau(X;q)$, and $f(X) = X$.
Now we give a weight scheme. For any pair $(X,Y) \in S \times S$, let $w(X,Y) = 1$ if $D(X,Y) = 1$ and $w(X,Y) = 0$ otherwise. For any $X, Y$ and $q \in Q(a,l,b)$, we set the weight $w'(X,Y,q)$ as follows. If $D(X,Y) \neq 1$ or $\tau(X;q) = \tau(Y;q)$, we set $w'(X,Y,q) = 0$. Otherwise:

1. If the index $j$ such that $X_j \neq Y_j$ represents one of the $l$ blocks covered in the part read by query $q$, then $w'(X,Y,q) = 1$.

2. If the index $j$ such that $X_j \neq Y_j$ represents a block in which query $q$ reads $a$ letters or $b$ letters, then $w'(X,Y,q) = l + 1$ when $\tau(X;q) = 1$ (and then $\tau(Y;q) = 0$), and $w'(X,Y,q) = 1/(l+1)$ when $\tau(X;q) = 0$.

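It is easy to check that this satisfies the conditions of a weight scheme (in case 2, for instance, $w'(X,Y,q)\,w'(Y,X,q) = (l+1) \cdot \frac{1}{l+1} = 1 = w^2(X,Y)$). Then, for any $X$, we have

$$\mu(X) = \sum_{Y} w(X,Y) = \frac{N}{\lceil \log N \rceil}\,(M-1) = \Theta\!\left(\frac{N^2}{\log N}\right)$$

by $M = 2^{\lceil \log N \rceil} = \Theta(N)$. We need to evaluate $\nu(X,q)\nu(Y,q)$ for pairs $(X,Y)$ such that $\tau(X;q) = 1$ and $\tau(Y;q) = 0$, or $\tau(X;q) = 0$ and $\tau(Y;q) = 1$. By symmetry, we only consider the case where $\tau(X;q) = 1$ and $\tau(Y;q) = 0$. Then,

$$\nu(X,q) = \left(M - \frac{M}{2^a}\right)(l+1) + l\,(M-1) + \left(M - \frac{M}{2^b}\right)(l+1) < 3(l+1)M.$$

The quantity $\nu(Y,q)$ is $1$, $\frac{M}{2^a} \cdot \frac{1}{l+1}$, or $\frac{M}{2^b} \cdot \frac{1}{l+1}$. In each of these three cases, it satisfies the inequality

$$\nu(Y,q) \le \frac{M}{l+1} \qquad (\text{since } l \le N/\lceil \log N \rceil < M).$$

Hence, $\nu(X,q)\,\nu(Y,q) < 3M^2 = O(N^2)$.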
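The weight scheme and the bounds above can be checked exhaustively on tiny instances. The Python sketch below is our own test harness (the names `tau`, `query_class`, `w_prime`, and `check_counting` are hypothetical). It enumerates every string of length $N = k \cdot n_{\text{blocks}}$ and every in-range query (queries running past the end always answer 0 and get weight 0), and verifies $\mu(X) = (N/k)(M-1)$, condition 2 of the weight scheme, and $\nu(X,q)\nu(Y,q) < 3M^2$. The block length `k` plays the role of $\lceil \log N \rceil$ but is chosen freely here; the argument only needs $l + 1 \le M$, which holds for `k = 2, n_blocks = 2`. Exact `Fraction` weights avoid rounding in the product $(l+1) \cdot \frac{1}{l+1}$.

```python
from fractions import Fraction
from itertools import product


def tau(x, i, s):
    """AS-oracle answer for the query (i, s)."""
    return int(x[i:i + len(s)] == s)


def query_class(i, length, k):
    """(a, l, b) with (i, s) in Q(a, l, b), |s| = length, block length k."""
    if i // k == (i + length - 1) // k and length < k:
        return (length, 0, 0)
    a = (-i) % k
    l, b = divmod(length - a, k)
    return (a, l, b)


def differing_blocks(x, y, k):
    return [j for j in range(len(x) // k) if x[j * k:(j + 1) * k] != y[j * k:(j + 1) * k]]


def w(x, y, k):
    """w(X, Y) = 1 iff the block distance D(X, Y) is 1."""
    return 1 if len(differing_blocks(x, y, k)) == 1 else 0


def w_prime(x, y, i, s, k):
    """The weight w'(X, Y, q) for q = (i, s), following cases 1 and 2."""
    diff = differing_blocks(x, y, k)
    if len(diff) != 1 or tau(x, i, s) == tau(y, i, s):
        return Fraction(0)
    a, l, _ = query_class(i, len(s), k)
    first_full = (i + a) // k              # index of the first fully read block
    if first_full <= diff[0] < first_full + l:
        return Fraction(1)                 # case 1: a fully read block differs
    # case 2: the differing block is read only partially (a or b letters)
    return Fraction(l + 1) if tau(x, i, s) == 1 else Fraction(1, l + 1)


def check_counting(k, n_blocks):
    """Exhaustive check for N = k * n_blocks (tiny sizes only) with M = 2^k."""
    n, m = k * n_blocks, 2 ** k
    xs = ["".join(bits) for bits in product("01", repeat=n)]
    for x in xs:
        assert sum(w(x, y, k) for y in xs) == n_blocks * (m - 1)      # mu(X)
    for i in range(n):
        for length in range(1, n - i + 1):
            for s in ("".join(bits) for bits in product("01", repeat=length)):
                nu = {x: sum(w_prime(x, y, i, s, k) for y in xs) for x in xs}
                for x in xs:
                    for y in xs:
                        if w(x, y, k) and tau(x, i, s) != tau(y, i, s):
                            # condition 2 of the weight scheme (here w(X, Y) = 1)
                            assert w_prime(x, y, i, s, k) * w_prime(y, x, i, s, k) >= 1
                            # the bound nu(X,q) nu(Y,q) < 3 M^2 from the proof
                            assert nu[x] * nu[y] < 3 * m * m
    return True


print(check_counting(k=2, n_blocks=2))  # True
```

Larger sizes work the same way, but the exhaustive loops grow quickly with $N$.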

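By Lemma 6, the quantum query complexity of identifying an AS-oracle is at least

$$\Omega\left( \min_{\substack{X,Y,q:\; w(X,Y) > 0,\\ \tau(X;q) \neq \tau(Y;q)}} \sqrt{\frac{\mu(X)\,\mu(Y)}{\nu(X,q)\,\nu(Y,q)}} \right) = \Omega\left( \sqrt{\frac{N^4/\log^2 N}{N^2}} \right) = \Omega\left( \frac{N}{\log N} \right).$$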
B.2 Reduction



We prove the lower bound for identifying an S-oracle (Theorem 2) by a reduction from the problem
of identifying an AS-oracle.

We show how to embed an AS-oracle string of size $N$ into an S-oracle string. For any AS-oracle string $X = x_0 x_1 \cdots x_{N-2} x_{N-1}$, we construct the following S-oracle string $X'$:

$$X' = B(0)B(0)^R \natural\, x_0 \natural\, B(1)B(1)^R \natural\, x_1 \natural \cdots B(N-1)B(N-1)^R \natural\, x_{N-1},$$

where $B(i)$ is the binary representation of index $i$, $B(i)^R$ is the reverse string of $B(i)$, and $\natural$ denotes a separator string of length $10 \log N$.

First, we can easily see that a query to the AS-oracle string $X$ is embedded into a query to the S-oracle string $X'$: for a query $(i, z_1 \cdots z_m)$ to $X$, the corresponding query to $X'$ is

$$B(i)B(i)^R \natural\, z_1 \natural\, B(i+1)B(i+1)^R \natural\, z_2 \natural \cdots B(i+m-1)B(i+m-1)^R \natural\, z_m.$$
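The first claim can be sanity-checked by brute force in Python. Two simplifications are our own assumptions, not the construction in the text verbatim: $B(i)$ is written with a fixed width of $\lceil \log N \rceil$ bits (most significant bit first), and the separator string is replaced by a single fresh character `#`, so that every embedded query can line up with only one position of $X'$. Under these assumptions the check confirms, for every $X \in \{0,1\}^N$ with $N = 4$ and every in-range AS-query, that the AS answer equals the S answer for the embedded query. The helper names are hypothetical.

```python
from itertools import product

SEP = "#"  # simplification: one fresh symbol in place of the separator string


def b(i: int, k: int) -> str:
    """B(i): the k-bit binary representation of i, most significant bit first."""
    return format(i, f"0{k}b")


def index_width(n: int) -> int:
    return max(1, (n - 1).bit_length())  # ceil(log2 n), at least 1


def build_s_string(x: str) -> str:
    """X' = B(0)B(0)^R # x_0 # B(1)B(1)^R # x_1 # ... B(N-1)B(N-1)^R # x_{N-1}."""
    k = index_width(len(x))
    return SEP.join(b(i, k) + b(i, k)[::-1] + SEP + c for i, c in enumerate(x))


def embed_query(i: int, z: str, n: int) -> str:
    """S-oracle query for the AS-query (i, z_1 ... z_m), following the same layout."""
    k = index_width(n)
    return SEP.join(b(i + t, k) + b(i + t, k)[::-1] + SEP + c for t, c in enumerate(z))


def check_embedding(n: int) -> None:
    """Compare both oracles on all X in {0,1}^n and all AS-queries with i + m <= n."""
    for bits in product("01", repeat=n):
        x = "".join(bits)
        x_prime = build_s_string(x)
        for i in range(n):
            for m in range(1, n - i + 1):
                for zs in product("01", repeat=m):
                    z = "".join(zs)
                    assert (x[i:i + m] == z) == (embed_query(i, z, n) in x_prime)


print(build_s_string("10"))  # 00#1#11#0
check_embedding(4)
```

With a one-character separator, the only $2\lceil \log N \rceil$ separator-free characters directly before a `#` are some $B(j)B(j)^R$, so an embedded query can only match at block $i$; this is what makes the exhaustive check succeed.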

Second, we show that any query $s$ to the S-oracle string $X'$ is either useless (i.e., the answer is independent of $X'$) or corresponds to a query to the AS-oracle $X$. Assume that $s$ is not useless. We consider the following two cases.

1. The string $s$ is "long": In this case, the query string includes $B(i)$ or $B(i)^R$, and hence the part of $X'$ referred to by the query is determined. Thus the query string to the S-oracle corresponds to a query string to the AS-oracle. For example, if

$$s = B(1)^R \natural\, z_1 \natural\, B(2)B(2)^R \natural\, z_2 \natural,$$

this query corresponds to the query string $(1, z_1 z_2)$ to the AS-oracle.

2. The string $s$ is "short": In this case, we cannot uniquely determine which part of $X'$ is referred to, because $s$ corresponds to only a part of $B(i)$ or $B(i)^R$. But even in such a case, we still obtain the higher bits of $B(i)$ by our construction of $X'$, since $B(i)^R$ ends with the highest bits of $i$. For example, consider the query string $00\natural 1$. The two bits $00$ of this query correspond to the indexes $i$ of the AS-oracle whose highest two bits are $00$. Then, the query string $00\natural 1$ indicates whether at least one of the bits $x_0, \ldots, x_{N/4-1}$ is 1. Thus, this query corresponds to the query $(0, 0^{N/4})$ of the AS-oracle.
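Under the same kind of simplification (fixed-width $B(i)$ and a single fresh separator character `#`, both our own assumptions rather than the text's construction), the short-query example can be verified exhaustively in Python: the query `00#1` hits $X'$ exactly when $x_i = 1$ for some $i < N/4$, that is, exactly when the AS-query $(0, 0^{N/4})$ answers 0. $N$ must be a power of two and at least 4 so that "the highest two bits are 00" means $i < N/4$.

```python
from itertools import product

SEP = "#"  # simplification: one fresh symbol in place of the separator string


def build_s_string(x: str) -> str:
    """X' = B(0)B(0)^R # x_0 # ... # B(N-1)B(N-1)^R # x_{N-1}, with fixed-width B(i)."""
    k = max(1, (len(x) - 1).bit_length())
    pieces = []
    for i, c in enumerate(x):
        bi = format(i, f"0{k}b")                 # B(i), most significant bit first
        pieces.append(bi + bi[::-1] + SEP + c)   # B(i)^R ends with the highest bits of i
    return SEP.join(pieces)


def check_short_query(n: int) -> None:
    """'00#1' occurs in X' iff the AS-query (0, 0^{N/4}) answers 0."""
    assert n >= 4 and n & (n - 1) == 0, "n must be a power of two, at least 4"
    for bits in product("01", repeat=n):
        x = "".join(bits)
        s_answer = ("00" + SEP + "1") in build_s_string(x)
        as_answer = x[: n // 4] == "0" * (n // 4)
        assert s_answer == (not as_answer)


check_short_query(8)
```

The only two characters that can precede a `#` and both be `0` are the last two bits of some $B(i)^R$, so the `1` after that `#` is necessarily $x_i$.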

By the above arguments, we can reduce identifying the AS-oracle to identifying the S-oracle
with an $O(\log N)$ factor. By Lemma 7, the lower bound for the S-oracle is then $\Omega(N/\log^2 N)$, which
completes the proof of Theorem 2.

C Proof of Theorem 3

We prove it by using the adversary method. We define $Q_{>L}$ (resp. $Q_{<L}$) as the set of query strings $q$ such that $|q| > L$ (resp. $|q| < L$). Let $L_1 := 3\log N$ and $L_2 := \log N - 1 - 2\log\log N$. Our weight scheme is as follows: for any pair $(x,y) \in S \times S$ such that $x \neq y$, let $w(x,y) = 1$. For any triple $(x,y,q) \in S \times S \times (Q_{>L_1} \cup Q_{<L_2})$ such that $\chi(x;q) \neq \chi(y;q)$, let $w'(x,y,q) = 1$.

It is easy to check that this satisfies the conditions of a weight scheme. Then, for any $x$, we have $\mu(x) = \sum_y w(x,y) = 2^N - 1$. For evaluating $\nu(x,q)\nu(y,q)$, we only consider the case where $\chi(x;q) = 1$ and $\chi(y;q) = 0$, by symmetry. Now $\nu(x,q)$ (resp. $\nu(y,q)$) is the number of instances $y$ (resp. $x$) such that $\chi(y;q) = 0$ (resp. $\chi(x;q) = 1$).

When $|q| > L_1$, it holds that $\nu(x,q) < 2^N$ and $\nu(y,q) \le (N - |q| + 1) \cdot 2^{N-|q|} < N \cdot 2^{N-L_1} = 2^N/N^2$. The value of $\nu(y,q)$ is obtained by considering the case that $x_i \cdots x_{i+|q|-1}$ is equal to $q$ for some $i$. When $|q| < L_2$, it holds that $\nu(x,q) \le (2^{|q|} - 1)^{N/|q|} = 2^N \cdot (1 - 2^{-|q|})^{N/|q|} < 2^N \cdot (1/e)^{N/(|q|\,2^{|q|})}$, and $\nu(y,q) < 2^N$. The value $\nu(x,q)$ is obtained by dividing the instance into $N/|q|$ blocks of length $|q|$
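and considering that none of the blocks are equal to $q$. Evaluating the value $N/(|q|\,2^{|q|})$ by using $|q| < L_2 < \log N$ and $2^{L_2} = N/(2\log^2 N)$, we obtain

$$\left(\frac{1}{e}\right)^{\frac{N}{|q|\,2^{|q|}}} < \left(\frac{1}{e}\right)^{\frac{N}{L_2\,2^{L_2}}} < \left(\frac{1}{e}\right)^{\frac{N \cdot 2\log^2 N}{N \log N}} = \left(\frac{1}{e}\right)^{2\log N} \le \frac{1}{N^2},$$

so that $\nu(x,q) < 2^N/N^2$. Hence, for all $q \in Q_{>L_1} \cup Q_{<L_2}$, we have $\nu(x,q)\nu(y,q) < 2^{2N}/N^2$. By Lemma 6, the quantum query complexity is at least

$$\Omega\left( \min_{\substack{x,y,q:\; w(x,y) > 0,\\ |q| > L_1 \text{ or } |q| < L_2}} \sqrt{\frac{\mu(x)\,\mu(y)}{\nu(x,q)\,\nu(y,q)}} \right) = \Omega(N).$$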
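The counting steps of this proof can be sanity-checked numerically. The Python sketch below assumes base-2 logarithms, and its function names are hypothetical. The first function enumerates all $x \in \{0,1\}^n$ for a small $n$ and confirms the two counts used for $\nu$: at most $(N - |q| + 1) \cdot 2^{N-|q|}$ strings contain $q$, and at most $(2^{|q|} - 1)^{\lfloor N/|q| \rfloor} \cdot 2^{N \bmod |q|}$ strings avoid it. The second evaluates, in log space, the worst-case fractions $\nu/2^N$ for $|q| \ge L_1$ and for $1 \le |q| \le L_2$ at a concrete size ($N = 2^{20}$, an arbitrary choice), using $\lfloor N/|q| \rfloor$ blocks rather than $N/|q|$. Both should be at most $-2\log N$, i.e., $\nu \le 2^N/N^2$.

```python
from itertools import product
from math import ceil, floor, log, log1p, log2


def check_counting_bounds(n: int) -> None:
    """Exhaustively check, over all x in {0,1}^n, the two counts used for nu."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    for length in range(1, n + 1):
        blocks, rest = divmod(n, length)
        for q in ("".join(bits) for bits in product("01", repeat=length)):
            hits = sum(q in x for x in strings)   # instances answering 1 on q
            # union bound over the n - |q| + 1 possible starting positions
            assert hits <= (n - length + 1) * 2 ** (n - length)
            # a string avoiding q must avoid it in each aligned block of length |q|
            assert len(strings) - hits <= (2 ** length - 1) ** blocks * 2 ** rest


def log2_fractions(n: int) -> tuple[float, float]:
    """log2 of the worst fraction nu / 2^N for long (|q| >= L1) and short (|q| <= L2) queries."""
    l1 = 3 * log2(n)
    l2 = log2(n) - 1 - 2 * log2(log2(n))
    m = ceil(l1)                                  # (n - m + 1) 2^{-m} decreases in m
    long_case = log2(n - m + 1) - m
    short_case = max(
        ((n // t) * log1p(-(2.0 ** -t)) / log(2) for t in range(1, floor(l2) + 1)),
        default=float("-inf"),
    )
    return long_case, short_case


check_counting_bounds(8)
n = 2 ** 20
long_case, short_case = log2_fractions(n)
print(long_case <= -2 * log2(n), short_case <= -2 * log2(n))  # True True
```

The second part cannot be illustrated at the brute-force size: for $N = 8$, $L_2 = \log N - 1 - 2\log\log N$ is negative, so there are no short queries to consider there.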